1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority
and is transmitted to the applicant according to Article 36.

2. This REPORT consists of a total of 5 sheets, including this cover sheet. 

☒ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT).

These annexes consist of a total of 6 sheets. 



3. This report contains indications relating to the following items:

I ☒ Basis of the report

II □ Priority

III □ Non-establishment of opinion with regard to novelty, inventive step and industrial applicability

IV □ Lack of unity of invention

V ☒ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement

VI □ Certain documents cited

VII □ Certain defects in the international application

VIII ☒ Certain observations on the international application
Applicant's or agent's file reference
PP/3253 PCT


IMPORTANT NOTIFICATION


International application No. 
PCT/GB99/03081 


International filing date (day/month/year)
14/09/1999 


Priority date (day/month/year) 
14/09/1998 


Applicant 

ASTON UNIVERSITY et al. 



1. The applicant is hereby notified that this International Preliminary Examining Authority transmits herewith the
international preliminary examination report and its annexes, if any, established on the international application.

2. A copy of the report and its annexes, if any, is being transmitted to the International Bureau for communication
to all the elected Offices.

3. Where required by any of the elected Offices, the International Bureau will prepare an English translation of the
report (but not of any annexes) and will transmit such translation to those Offices.



4. REMINDER 

The applicant must enter the national phase before each elected Office by performing certain acts (filing 
translations and paying national fees) within 30 months from the priority date (or later in some Offices) (Article 
39(1)) (see also the reminder sent by the International Bureau with Form PCT/IB/301). 

Where a translation of the international application must be furnished to an elected Office, that translation must
contain a translation of any annexes to the international preliminary examination report. It is the applicant's
responsibility to prepare and furnish such translation directly to each elected Office concerned.

For further details on the applicable time limits and requirements of the elected Offices, see Volume II of the
PCT Applicant's Guide.
Tel. +49 89 2399-8102
(PCT Article 36 and Rule 70)



Applicant's or agent's file reference 
PP/3253 PCT 


See Notification of Transmittal of International 
FOR FURTHER ACTION Preliminary Examination Report (Form PCT/IPEA/416) 


International application No. 
PCT/GB99/03081 


International filing date (day/month/year) 
14/09/1999 


Priority date (day/month/year) 
14/09/1998


International Patent Classification (IPC) or national classification and IPC
C12N15/10 


Applicant
ASTON UNIVERSITY et al. 



1. This international preliminary examination report has been prepared by this International Preliminary Examining Authority
and is transmitted to the applicant according to Article 36.



2. This REPORT consists of a total of 5 sheets, including this cover sheet. 

☒ This report is also accompanied by ANNEXES, i.e. sheets of the description, claims and/or drawings which have
been amended and are the basis for this report and/or sheets containing rectifications made before this Authority
(see Rule 70.16 and Section 607 of the Administrative Instructions under the PCT).

These annexes consist of a total of 6 sheets. 



3. This report contains indications relating to the following items:

I ☒ Basis of the report

II □ Priority

III □ Non-establishment of opinion with regard to novelty, inventive step and industrial applicability

IV □ Lack of unity of invention

V ☒ Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement

VI □ Certain documents cited

VII □ Certain defects in the international application

VIII ☒ Certain observations on the international application



Date of submission of the demand 
15/03/2000 


Date of completion of this report 
04.12.2000 


Name and mailing address of the international 
European Patent Office
D-80298 Munich



I. Basis of the report

1. This report has been drawn on the basis of (substitute sheets which have been furnished to the receiving Office in
response to an invitation under Article 14 are referred to in this report as "originally filed" and are not annexed to
the report since they do not contain amendments (Rules 70.16 and 70.17)):
Description, pages: 

1-35 as originally filed 

Claims, No.: 

1-27 as received on 21/08/2000 with letter of 18/08/2000 

Drawings, sheets: 

1/2,2/2 as originally filed 



2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item. 

These elements were available or furnished to this Authority in the following language: , which is:

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1(b)).

□ the language of publication of the international application (under Rule 48.3(b)).

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule
55.2 and/or 55.3).

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the
international preliminary examination was carried out on the basis of the sequence listing:

□ contained in the international application in written form.

□ filed together with the international application in computer readable form.

□ furnished subsequently to this Authority in written form.

□ furnished subsequently to this Authority in computer readable form.

□ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in
the international application as filed has been furnished.

□ The statement that the information recorded in computer readable form is identical to the written sequence 
listing has been furnished. 

4. The amendments have resulted in the cancellation of: 

□ the description, pages: 

□ the claims, Nos.: 

□ the drawings, sheets:

5. □ This report has been established as if (some of) the amendments had not been made, since they have been 
considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this
report.)



6. Additional observations, if necessary: 



V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability;
citations and explanations supporting such statement 

1. Statement 

Novelty (N) Yes: Claims 1-27 

No: Claims 

Inventive step (IS) Yes: Claims 1-27 

No: Claims 

Industrial applicability (IA) Yes: Claims 1-27

No: Claims 



2. Citations and explanations 
see separate sheet 



VIII. Certain observations on the international application

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 
see separate sheet 

V. Reasoned statement



Novelty (Article 33(2) PCT) 

The claimed sets of libraries of genes (Claims 1-6, 13) or of proteins (Claims 7-12), 
their use (Claims 14-23), the protein (SEQ. ID. NO: 1) and the gene coding for it 
(Claims 24-25) as well as the method of constructing randomised gene libraries 
(Claims 26-27) are novel over the prior art. 

Inventive step (Article 33(3) PCT) 

The sets of libraries contain between 12 and 40 libraries of a particular design; 
such sets with this design are not derivable from the cited prior art. 
The sets and their use (Claims 1-23) are therefore considered inventive. 

The protein of Claim 24 and the corresponding gene were derived by combining 
features of multiple, known zinc fingers (see page 32), and are also considered 
inventive. 

The methods of Claims 26-27 are inventive in the light of the Description, but may
have to be further defined (as stated under VIII:3).

VIII. Certain observations 

1. 

Claims 1 and 7 have been reworded to improve clarity; the present version appears 
acceptable under Article 34(2)(b) PCT. 

2.

Amended Claim 24 contains typographical errors; original Claim 24 refers to 
K, H and A at the positions of the square, cross and dollar. 

3. 

Claims 26-27 were initially objected to in view of similarities with prior art methods, 
such as those of Choo and Klug (see page 2 of the Description). 
The Applicant has replied that the present method differs from those of the prior 
art because of certain characteristic features; however, all of these features are 
not present in said claims. 

These claims should be revised in a later phase according to national/regional 
regulations. 

4. 

The statement on page 9, line 24, "more broadly applicable", may be correct 
per se - but has no clear interpretation and does not necessarily mean that the 
method is inventive for every such vague broader meaning. 

5. 

The Description should be amended to correspond to the reworded claims. 

CLAIMS



1. A set of libraries of genes which code for proteins which are
capable of specific binding interactions with a specific binding partner by
amino acid residues at at least two specified positions including a first
specified position and at least one other specified position, which set of
libraries consists of:

a) 6 to 20 libraries in which each library has a triplet that codes
for at least one but less than 20 amino acids at said first specified position,
and is randomised at the or each triplet coding for the said at least one
other specified position, the arrangement being such that interactions of the
proteins coded for by the said 6 to 20 libraries with a specific binding
partner identifies a triplet that codes for an amino acid at the said first
specified position that takes part in the specific binding interaction, and

b) 6 to 20 libraries in each of which libraries said first specified
position is randomised and a different one of said at least one other
specified positions has a triplet that codes for at least one but less than 20
amino acids.

2. The set of libraries of genes as claimed in claim 1, which set
of libraries consists of:

a) 12 libraries in which each library has a triplet that codes for
one or several but less than 20 amino acids at the said first determined
position, the triplets being as shown in Table 1 or Table 2, and

b) 12 libraries of corresponding design for each of the said one
or more other determined positions.

3. The set of libraries of genes as claimed in claim 1 or claim 2,
wherein the genes code for zinc fingers.

4. The set of libraries of genes as claimed in claim 3, which set
consists of 36 libraries in three groups of 12 libraries which code for amino
acids at the -1 and +3 and +6 positions respectively.

5. The set of libraries of genes as claimed in claim 3 or claim 4,
wherein each gene codes for a protein comprising 3 zinc fingers.

6. The set of libraries of genes as claimed in claim 5, wherein
each gene codes for a protein having the sequence (SEQ ID NO: 2)

TGEKPYKCPECGKSFSXKSXLVXHQRTH
TGEKPYKCPECGKSFSXKSXLVXHQRTH
TGEKPYKCPECGKSFSXKSXLVXHQRTH

where X is any amino acid

7. A set of libraries of proteins, which proteins are capable of
specific binding interactions with a specified binding partner by amino acid
residues at at least one specified position including a first specified position
and at least one other specified position, which set of libraries consists of:

a) 6 to 20 libraries in which each library has at least one but less
than 20 amino acid residues at the said first specified position and is
randomised at the said at least one other determined position, the
arrangement being such that interaction of the 6 to 20 libraries with a
specific binding partner identifies an amino acid residue at the said first
specified position that takes part in the specific binding interaction, and

b) 6 to 20 libraries in each of which libraries said first specified
position is randomised and a [illegible]
other specified position.



8. The set of libraries of proteins as claimed in claim 7, which
set of libraries consists of

a) 20 libraries in which each library has one specified amino acid
residue at the said first determined position and is randomised at the said
one or more other determined positions, and

b) 20 libraries of corresponding design for each of the said one
or more other determined positions.

9. The set of libraries of proteins as claimed in claim 7 or claim
8, wherein the proteins are zinc fingers.

10. The set of libraries of proteins as claimed in claim 7, which
set consists of 60 libraries in three groups of 20 libraries with specified
amino acids at the -1 and +3 and +6 positions respectively.

11. The set of libraries of proteins as claimed in claim 9 or claim
10, wherein each protein comprises three zinc fingers.

12. The set of libraries of proteins as claimed in claim 11, wherein
each protein has the sequence (SEQ ID NO: 2)

TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 
TGEKPYKCPECGKSFSXKSXLVXHQRTH 

where X is any amino acid 

13. A set of libraries of genes which code for the set of libraries of
proteins defined in any one of claims 7 to 12.

14. A method of identifying a protein which interacts with a
specific binding partner, which method comprises providing a set of
libraries of proteins as defined in any one of claims 7 to 12, incubating the
specific binding partner with each library of the set, observing specific
binding interactions with certain libraries of the set, and using the
observations to identify a protein which interacts with the specific binding
partner.

15. The method as claimed in claim 14, wherein the specific
binding partner is a polynucleotide.

16. The method as claimed in claim 14, wherein the specific
binding interactions are observed by radiometric or luminescent assay.

17. The method as claimed in claim 14, wherein the specific
binding interactions are observed by imaging means.

18. The method as claimed in claim 14, wherein the specific
binding interactions are observed by scintillation proximity assay.

19. The method as claimed in claim 18, wherein the sets of
libraries of proteins are immobilised on scintillation proximity assay
surfaces and the specific binding partner is radiolabelled.

20. The method of claim 18 or claim 19, wherein after incubation
the scintillation proximity assay surfaces are washed to distinguish stronger
specific binding interactions from weaker ones.

21. The method as claimed in claim 14, wherein the specific
binding interactions are observed by colorimetric means.

22. The method as claimed in claim 21, wherein the specific
binding partner is biotinylated and the specific binding interaction is
detected using a signal generating streptavidin conjugate.

23. The method as claimed in claim 21 or claim 22, wherein after
incubation the binding interactions are washed to distinguish stronger
specific binding interactions from weaker ones.

24. A protein having the sequence (SEQ ID NO: 1)

TGEKPYKCPECGKSFS.KS*LV$HQRTH
TGEKPYKCPECGKSFS.KS*LV$HQRTH
TGEKPYKCPECGKSFS.KS*LV$HQRTH.

25. A gene which codes for the protein of Claim 24.


26. A method of constructing randomised gene libraries in which
the number of genes is the same as the number of encoded proteins and
which contain no termination codons at the predetermined positions of
randomisation, the method comprising the steps of:

a) providing a template oligonucleotide which is fully randomised
at predetermined codon positions;

b) for each predetermined codon position providing a pool of
selection oligonucleotides, wherein each member of said pool contains a
different codon selected from the group consisting of

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA,
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT

at the predetermined codon position;

c) selecting one or more selection oligonucleotides from each
pool in order to encode the required gene or library;

d) allowing the ligated selected oligonucleotides from each pool
to hybridise with the template oligonucleotide;

e) forming one or more constructs by ligating the hybridised
selection oligonucleotides together;

f) removing a region from a gene of interest corresponding to
the hybridised product from step e);

g) forming a gene library or genes by ligating the products
from step e) into the said gene of interest wherein the said gene of interest
is contained within a suitable expression vector.
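
The two properties recited for the codon list of claim 26 (one gene per encoded protein, and no termination codons) can be checked mechanically. The sketch below is illustrative only and is not part of the claims: the amino acid assignments are taken from the standard genetic code, which the claim itself does not spell out, and the name check_codon_set is a hypothetical helper.

```python
# The 20 codons listed in step b), mapped to one-letter amino acid codes
# using the standard genetic code (an assumption; the claim lists codons only).
CODON_TO_AA = {
    "AAA": "K", "AAC": "N", "ACC": "T", "AGC": "S", "ATG": "M",
    "ATT": "I", "CAG": "Q", "CAT": "H", "CCG": "P", "CGC": "R",
    "CTG": "L", "GAA": "E", "GAT": "D", "GCG": "A", "GGC": "G",
    "GTG": "V", "TAT": "Y", "TGG": "W", "TGC": "C", "TTT": "F",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_codon_set(codon_to_aa):
    """Summarise a codon set: size, amino acids covered, stop codons present."""
    codons = set(codon_to_aa)
    return {
        "n_codons": len(codons),
        "n_amino_acids": len(set(codon_to_aa.values())),
        "has_stop": bool(codons & STOP_CODONS),
    }


print(check_codon_set(CODON_TO_AA))
# {'n_codons': 20, 'n_amino_acids': 20, 'has_stop': False}
```

With one codon per amino acid, a position built from this pool yields exactly as many distinct genes as distinct proteins, which is the property named in the preamble of the claim.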

27. A method of producing proteins encoded by the randomised
gene libraries of claim 26 comprising the steps of:

a) transforming a suitable host cell with the gene or gene
library of claim 27 construct;

b) expressing the genes to form proteins;

c) purifying the proteins.

International application No.
International filing date (day/month/year)

14/09/1999

(Earliest) Priority Date (day/month/year)

14/09/1998

Applicant

ASTON UNIVERSITY et al.



This International Search Report has been prepared by this International Searching Authority and is transmitted to the applicant
according to Article 18. A copy is being transmitted to the International Bureau.

This International Search Report consists of a total of 3 sheets.

☒ It is also accompanied by a copy of each prior art document cited in this report.



1. Basis of the report

a. With regard to the language, the international search was carried out on the basis of the international application in the
language in which it was filed, unless otherwise indicated under this item.

□ the international search was carried out on the basis of a translation of the international application furnished to this
Authority (Rule 23.1(b)).

b. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international search
was carried out on the basis of the sequence listing:

□ contained in the international application in written form.

filed together with the international application in computer readable form.
furnished subsequently to this Authority in written form.
furnished subsequently to this Authority in computer readable form.

□

the statement that the subsequently furnished written sequence listing does not go beyond the disclosure in the
international application as filed has been furnished.

the statement that the information recorded in computer readable form is identical to the written sequence listing has been
furnished.



2. □ Certain claims were found unsearchable (see Box I).

3. □ Unity of invention is lacking (see Box II).

With regard to the title,

□ the text is approved as submitted by the applicant.

☒ the text has been established by this Authority to read as follows:
GENE AND PROTEIN LIBRARIES AND METHODS RELATING THERETO

With regard to the abstract,

☒ the text is approved as submitted by the applicant.

□ the text has been established, according to Rule 38.2(b), by this Authority as it appears in Box III. The applicant may,
within one month from the date of mailing of this international search report, submit comments to this Authority.

The figure of the drawings to be published with the abstract is Figure No.



□ as suggested by the applicant. ☒ None of the figures.

□ because the applicant failed to suggest a figure.

□ because this figure better characterizes the invention.

(57) Abstract



A set of libraries of proteins, which proteins are capable of specific binding interactions by virtue of amino acid residues at two or 
more determined positions including a first determined position and one or more other determined positions, which set of libraries consists
of: a) 6 to 20 libraries in which each library has one or several but less than 20 amino acid residues at the said first determined position 
and is randomised at the said one or more other determined positions, the arrangement being such that interaction of the 6 to 20 libraries 
with a specific binding partner identifies an amino acid residue at the said first determined position that takes part in the specific binding 
interaction, and b) 6 to 20 libraries of corresponding design for each of the said one or more other determined positions. A set of libraries 
of genes which code for the proteins. A method of identifying a protein which interacts with a specific binding partner, which method 
comprises incubating the protein with each library of the set of libraries of proteins, observing specific binding interactions with certain 
libraries of the set, and using the observations to identify a protein which interacts with the specific binding partner. A method of making 
a library of randomised genes. 

GENE AND PROTEIN LIBRARIES AND METHODS RELATING THERETO

Introduction

Naturally occurring proteins are capable of specific binding
interactions with other proteins and other molecules. It is well known that
such proteins can be used as scaffolds and specific amino acid residues
changed in order to improve binding properties. The changes required can
be determined by combinatorial chemistry means. The subject is reviewed
by Per-Åke Nygren and Mathias Uhlén in Curr. Opin. Struct. Biol. (1997) 7,
463-469, who list cyclic peptides, immunoglobulin-like scaffolds, bacterial
receptors, DNA-binding proteins and protease inhibitors as examples of
protein scaffolds. The authors conclude that, starting from a suitable
protein domain, a combinatorial approach coupled with powerful
selection or screening strategies can be used to obtain novel proteins
capable of binding a desired target molecule. But the selection or
screening strategies can be difficult. It is this problem that is addressed by
the present invention.

Zinc fingers are examples of protein scaffolds of the kind
described. Zinc fingers are protein motifs ("mini-domains") which interact
with double-stranded DNA (some also bind RNA). This interaction is
dependent on DNA sequence, and is therefore termed
sequence-specific. The interaction between the zinc finger and its target
DNA sequence is modular: one zinc finger recognises three bases of DNA.
Basic rules concerning the interaction were determined early on by
structural studies (both X-ray crystallography and NMR spectroscopy) of
zinc finger-DNA complexes. In essence, three residues (amino acids)
within the zinc finger make base-specific contacts with the DNA. These
three residues differ greatly between different zinc fingers, allowing a
limited repertoire of different DNA sequences to be recognised. Early
mutagenesis experiments determined that if these variable residues are
changed, a different DNA sequence may be recognised. (A fourth residue
sometimes contributes to DNA recognition, but this residue is
well-conserved between different zinc finger proteins.) In practice then, the zinc
finger may be viewed as a molecular scaffold, which orientates the three
variable residues suitably to enable them to make base-specific contacts
with the DNA.

It would be most advantageous to have available a zinc finger
to bind each trinucleotide (3 bases) of dsDNA. Initial attempts to achieve
this goal centred on the structure-based design of novel zinc finger
proteins. Since 1994 however, several groups have employed
combinatorial libraries of zinc finger proteins and/or target DNA sequences
to identify novel zinc fingers which bind to the required DNA sequences.
One such technique has been developed by Choo and Klug
and is described in WO 96/06166 and in PNAS, 91, 11163-11167 and
11168-11172 (1994). A single library of zinc finger genes was constructed.
The library was based on a naturally occurring zinc finger protein, Zif 268,
which contains three zinc fingers. Only the central finger was randomised
at seven positions. The library of genes was cloned as a fusion to the fd
phage gene pIII. When expressed, a library of bacteriophage resulted, in
which each bacteriophage displayed a randomised zinc finger protein on its
surface. In a first stage assay, this library was incubated with a target DNA
molecule, and individual clones that bound to the target were purified and
sequenced. In a second stage assay, each of those clones selected was
incubated with a variety of related DNA sequences in order to further
investigate its binding properties. The technique is subject to some
inherent disadvantages:

• Deconvolution is not addressed - purification is inherent in
the method. The assay results in a pool of bacteriophage. For
identification purposes, each member of that pool must be cultured
independently and its DNA sequenced.

• The experimental end point is determined empirically. While
the assay is in progress, it is impossible to determine the number of
different phage binding to the target DNA. The end point is therefore
determined empirically, e.g. by 15 washes. Any zinc finger which binds to
the target DNA with sufficient strength to withstand these washes is
selected, and a pool of zinc fingers results. There is no in-built mechanism
to determine relative binding strengths of zinc fingers within this selected
pool; hence the need for a second stage assay.

• Library size. Constructing a library of the size required is
technically difficult - indeed, the authors' largest library is 200 times smaller
than that theoretically required. When expressed therefore, several zinc
finger proteins may be omitted.
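
To give a sense of scale for the library-size point, the sketch below computes the number of variants implied by randomising seven positions. It is a rough illustration only: the text does not say which figure the cited authors treated as the size "theoretically required", so both the protein-level count (20 amino acids per position) and the gene-level count under full NNN randomisation (64 codons per position, an assumption) are shown. The function names are hypothetical.

```python
AMINO_ACIDS = 20   # standard amino acids
CODONS = 64        # all NNN triplets (assumes fully randomised codons)


def protein_diversity(positions: int) -> int:
    """Number of distinct proteins when each position may hold any amino acid."""
    return AMINO_ACIDS ** positions


def gene_diversity(positions: int) -> int:
    """Number of distinct genes when each position may hold any NNN triplet."""
    return CODONS ** positions


print(protein_diversity(7))  # 1280000000
print(gene_diversity(7))     # 4398046511104
```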

The present invention addresses these shortcomings.

Zinc fingers are small protein motifs. They form parts of
larger proteins, but perform their specific function within those proteins.
Zinc fingers exist in tandem arrays: proteins containing between 2 and 37
different zinc fingers have been identified.

In two dimensions, a single zinc finger appears as follows:

[Diagram of a single zinc finger not reproduced]

In this diagram, each circle represents a single amino acid
residue.

The zinc finger is so stable that its structure is unaffected by
the replacement of virtually all residues marked "X" with alanine (Michael
et al, PNAS 89, 4796-4800, 1992). Spaced correctly (as above) the
following requirements are all that are necessary for the formation of a zinc
finger:

• The 2 cysteine (C) residues

• The 2 histidine (H) residues

• The zinc ion (Zn), which is co-ordinated (bound) by the C and
H residues

• Three hydrophobic residues: tyrosine/phenylalanine (Y/F);
phenylalanine (F4); leucine (L10).

Zinc fingers bind to nucleic acids - either DNA or RNA. In
nature, zinc fingers usually form part of transcription factors, but in the
laboratory, it is possible to work with them independently from the rest of
these proteins. The zinc finger exemplified herein binds to double-stranded
DNA. One zinc finger binds to three bases of DNA (a trinucleotide).

Several zinc fingers are usually linked in tandem. Most
frequently, three zinc fingers interact with successive trinucleotides, which
means that altogether, the three zinc fingers will interact with (recognise) a
specific 9 base pair (bp) sequence of DNA. Each zinc finger will recognise
a specific trinucleotide. However, nature has only provided a limited
repertoire of zinc fingers, so the number of 9 base pair sequences which
can be recognised is very limited.

The mechanism of DNA recognition is sequence-specific and
surprisingly simple. Three residues (amino acids) within the zinc finger
make contacts (hydrogen bonds or van der Waals interactions, for
example) with three bases of DNA. Most of these contacts are with one
strand of the DNA.

Many experiments have shown that if the three interacting
residues (here named α, β and γ) are changed, the resulting zinc finger will
recognise a different sequence of DNA. Moreover, if a library of zinc finger
proteins is made in which α, β and γ are randomised, new zinc finger
proteins may be identified by screening the library with a specific sequence
of DNA.

There are 64 possible trinucleotides:

Number of trinucleotides NNN = 4 × 4 × 4 = 64, where each N is A, C, G or T
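
The count of 64 can be confirmed by enumerating every three-base sequence. This is a minimal sketch for illustration; the function name all_trinucleotides is hypothetical.

```python
from itertools import product

BASES = "ACGT"


def all_trinucleotides():
    """Return every 3-base sequence over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=3)]


trinucleotides = all_trinucleotides()
print(len(trinucleotides))   # 64
print(trinucleotides[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```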



Therefore, 64 different zinc finger proteins, each of which
binds optimally to one trinucleotide, would represent a complete zinc finger
code. A problem (addressed by this invention) is to develop such a code.

This invention involves applying the principles of
combinatorial chemistry to the problem. The key to any combinatorial
system (whether biological, chemical or any other system) is deconvolution:
the identification of an active substituent from within a mixture. The key to
discovering an optimal zinc finger for each trinucleotide is to identify the
optimum combinations of residues α, β and γ. There will be an optimum
combination of α, β and γ for each trinucleotide. By using multiple libraries
of zinc fingers, with highly controlled overlap between the libraries,
deconvolution can be achieved without purification.
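
The idea of deconvolution by overlapping libraries can be sketched in a few lines. The model below is a deliberately simplified assumption, not the assay described in this application: each library fixes one of the three positions to a single amino acid and randomises the other two, and a library is taken to give a signal only when its fixed residue matches the optimum at that position. All names (build_library_set, library_binds, deconvolute) are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
POSITIONS = ("alpha", "beta", "gamma")  # the three interacting residues


def build_library_set():
    """One library per (position, fixed residue): 3 x 20 = 60 libraries."""
    return [(pos, aa) for pos in POSITIONS for aa in AMINO_ACIDS]


def library_binds(library, optimum):
    """Toy binding model (assumption): a library gives a signal only if its
    fixed residue equals the optimal residue at that position. The other two
    positions are randomised, so the optimal partners are always present."""
    pos, aa = library
    return optimum[pos] == aa


def deconvolute(optimum):
    """Read the optimal residue at each position from the libraries that bind."""
    hits = {}
    for pos, aa in build_library_set():
        if library_binds((pos, aa), optimum):
            hits[pos] = aa
    return hits


hidden = {"alpha": "R", "beta": "H", "gamma": "K"}  # made-up optimum
print(len(build_library_set()))  # 60
print(deconvolute(hidden))       # {'alpha': 'R', 'beta': 'H', 'gamma': 'K'}
```

Under this toy model, 60 library-level readings identify one combination out of 20 × 20 × 20 = 8000 without isolating any individual clone. Real binding signals are graded rather than all-or-nothing, so the sketch shows only the bookkeeping, not the measurement.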

The Invention

In one aspect the invention provides a set of libraries of
genes which code for proteins which are capable of specific binding
interactions by virtue of amino acid residues at two or more determined
positions including a first determined position and one or more other
determined positions, which set of libraries consists of:

a) 6 to 20 libraries in which each library has a triplet that codes
for one or several but less than 20 amino acids at the said first determined
position, and is randomised at the triplet or triplets coding for the said one
or more other determined positions, the arrangement being such that
interactions of the proteins coded for by the said 6 to 20 libraries with a
specific binding partner identifies a triplet that codes for an amino acid at
the said first determined position that takes part in the specific binding
interaction, and

b) 6 to 20 libraries of corresponding design for each of the said
one or more other determined positions.

In another aspect the invention provides a method of
constructing randomised gene libraries in which the number of genes is the
same as the number of encoded proteins and which contain no termination
codons at the predetermined positions of randomisation, the method
comprising the steps of:

a) providing a template oligonucleotide which is fully
randomised at one or more predetermined codon positions;

b) for each predetermined codon position providing a pool of
selection oligonucleotides, wherein each member of said pool contains a
different codon selected from the group consisting of

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA,
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT

at the predetermined codon position;

c) selecting one or more selection oligonucleotides from each
pool in order to encode the required gene or library;

d) allowing the selected selection oligonucleotides from each
pool to hybridise with the template oligonucleotide;

e) forming one or more constructs by ligating the hybridised
selection oligonucleotides together;

f) removing a region from a gene of interest corresponding to
the hybridised product from step e);

g) forming a gene or library of genes by ligating the products
from step e) into the said gene of interest wherein the said gene of interest
is contained within a suitable expression vector.

A preferred method of selecting one or more selection oligonucleotides
from each pool in order to encode the required gene or library at step c)
is to select the selection oligonucleotides according to randomisation
strategy B, described herein.

A method of producing proteins encoded by these randomised gene
libraries is also provided by the invention and comprises the steps of:

a) transforming a suitable host cell with a gene or gene library
construct;

b) expressing the genes to form proteins;

c) purifying the proteins.

Suitable host cells, gene expression methods and purification protocols for
carrying out this method are known in the art.

In another aspect the invention provides a set of libraries of
proteins, which proteins are capable of specific binding interactions by
virtue of amino acid residues at two or more determined positions including
a first determined position and one or more other determined positions,
which set of libraries consists of:

a) 6 to 20 libraries in which each library has one or several but
less than 20 amino acid residues at the said first determined position and is
randomised at the said one or more other determined positions, the
arrangement being such that interaction of the 6 to 20 libraries with a
specific binding partner identifies an amino acid residue at the said first
determined position that takes part in the specific binding interaction, and

b) 6 to 20 libraries of corresponding design for each of the said
one or more other determined positions.

In another aspect the invention provides a method of
identifying a protein which interacts with a specific binding partner, which
method comprises providing a set of libraries of proteins as defined,
incubating the specific binding partner with each library of the set,
observing specific binding interactions with certain libraries of the set, and
using the observations to identify a protein which interacts with the specific
binding partner. Preferably, as discussed in more detail below, this method
may be performed using radiometric or non-radiometric detection means,
for example scintillation detection, luminescence (for example fluorescence)
detection, colorimetric detection, or imaging, by methods known in the art.

A library of compounds (e.g. genes or proteins) consists of a
plurality of compounds which are all different but which have some
characteristic in common. The compounds of the library may be presented
either separately or together, in solution or on a solid phase. In a set of
libraries, the compounds of any one library have some characteristic in
common which differentiates them from the compounds of each other library
of the set.

A specific binding interaction of a protein with another
molecule (the specific binding partner) is an interaction mediated by a
specified amino acid residue at one or more, usually several, positions in the
protein molecule. The specific binding partner is usually, though not
necessarily, a polymeric molecule, e.g. a nucleic acid (DNA or RNA) or
another protein.

In relation to proteins, the statement that a library is
randomised at a determined position is herein used to mean that the library
contains a random mixture of all or almost all possible amino acid residues.
We say "almost all" because there might be a special reason for omitting
one residue, e.g. Cys, or a few amino acid residues. In relation to genes,
the statement that a triplet is randomised is herein used to indicate a triplet
NNN (where N is any nucleotide) or a triplet that is capable of coding for all
or almost all the amino acids.

The term protein is herein used to encompass any chain of 
two or more amino acid residues. 

The term polynucleotide is herein used to encompass any
chain of three or more nucleotide residues, single-stranded or
double-stranded DNA or RNA.

The experimental section below describes a set of libraries of
zinc finger genes which code for a set of libraries of zinc finger proteins,
which are used to identify specific zinc fingers which interact with specific
polynucleotides. But the invention is more broadly applicable. It is in
principle possible to make a set of libraries of any protein which undergoes
a specific binding interaction, using that protein as a scaffold to vary
specific amino acid residues. It is in principle possible to make a set of
libraries of genes coding for such a set of protein libraries. And it is
possible to use such a set of protein libraries to investigate any specific
binding interaction, e.g. where the specific binding partner is a
polynucleotide or another protein or a different molecule. It may be noted
that zinc fingers may be capable of undergoing specific binding
interactions, not only with polynucleotides, but also with other proteins.

It is convenient to control the overlap between libraries of a
set of protein libraries by controlling the DNA sequences of the genes
which code for the proteins. Thus, to make a library of zinc finger proteins,
a library of zinc finger genes is first made. For convenience in relation to
what follows we quote the genetic code which relates the identities of
codons to the amino acids which they specify.



                       2nd base
1st base     A       C       G       T       3rd base

   A         Lys     Thr     Arg     Ile        A
             Asn     Thr     Ser     Ile        C
             Lys     Thr     Arg     Met        G
             Asn     Thr     Ser     Ile        T

   C         Gln     Pro     Arg     Leu        A
             His     Pro     Arg     Leu        C
             Gln     Pro     Arg     Leu        G
             His     Pro     Arg     Leu        T

   G         Glu     Ala     Gly     Val        A
             Asp     Ala     Gly     Val        C
             Glu     Ala     Gly     Val        G
             Asp     Ala     Gly     Val        T

   T         STOP    Ser     STOP    Leu        A
             Tyr     Ser     Cys     Phe        C
             STOP    Ser     Trp     Leu        G
             Tyr     Ser     Cys     Phe        T



Thus, for example, a codon with multiple degeneracy, e.g.
ANN, comprises 16 different triplets and codes for seven different amino
acids, namely Lys, Asn, Thr, Arg, Ser, Ile and Met.
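
By way of illustration only, the ANN example can be checked by enumeration. The following is a minimal sketch in Python; the names CODON_TABLE and expand are hypothetical helpers introduced here, and the table is the standard genetic code quoted above, written with one-letter amino acid symbols and "*" for STOP.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (assumption: the same
# code as the table quoted above; '*' marks a termination codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def expand(pattern):
    """Expand a degenerate codon such as 'ANN' (N = any base) into triplets."""
    choices = ["ACGT" if ch == "N" else ch for ch in pattern]
    return ["".join(t) for t in product(*choices)]


triplets = expand("ANN")
amino_acids = sorted({CODON_TABLE[t] for t in triplets})
print(len(triplets), amino_acids)
```

The sixteen triplets map to the one-letter symbols I, K, M, N, R, S and T, i.e. Ile, Lys, Met, Asn, Arg, Ser and Thr.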

While it is possible in principle to use as few as six libraries of
genes to identify a particular amino acid residue, it is in practice convenient
to use twelve such libraries in groups of four, wherein libraries 1 to 4
identify the first nucleotide of a triplet, libraries 5 to 8 identify the second
nucleotide of the triplet, and libraries 9 to 12 identify the third nucleotide of
the triplet which codes for the amino acid. In this arrangement it is
preferable that only one of libraries 1 to 4 (and correspondingly only one of
libraries 5 to 8 and only one of libraries 9 to 12) codes for any particular
amino acid. These considerations give rise to various possible sets of 12
libraries, of which one is shown in the following Table 1.

Table 1

Library   Residue   Codon                 Amino Acids Specified

1         α         A (A/C/T) N           Lys Asn Thr Ile Met
2         α         C (A/C/G) N           Gln His Pro Arg
3         α         G N N                 Glu Asp Ala Gly Val
4         α         T N N                 Tyr Ser Cys Trp Leu Phe
5         α         N A N                 Lys Asn Gln His Glu Asp Tyr
6         α         N C N                 Thr Pro Ala Ser
7         α         (C/G/T) G N           Arg Gly Cys Trp
8         α         N T N                 Ile Met Leu Val Phe
9         α         (A/C/G) (A/C/T) G     Lys Thr Met Gln Pro Leu Glu Ala Val
10        α         T G G                 Trp
11        α         N (A/G) C             Asn Ser His Arg Asp Gly Tyr Cys
12        α         (A/T) T C             Ile Phe

Bases in parentheses denote a mixture of the listed bases at that position.



Note that any given amino acid appears only once in any set
of 4 libraries.

Similar randomisation can now be applied to all three
positions α, β and γ of zinc finger proteins, to generate libraries 1-36. In
libraries 1-12, the randomisation of residue α is controlled (in these
libraries, residues β and γ are fully randomised: they are specified by the
codon NNN). Similarly, libraries 13-24 control the randomisation of residue
β, and libraries 25-36 control the randomisation of residue γ.

All 36 gene libraries are expressed to generate zinc finger
libraries. These zinc finger libraries are then incubated with a
polynucleotide of interest, in such a way as to identify one library from each
group of four that binds most strongly to the polynucleotide. For example,
each library may be placed in an individual well of a microtitre plate and
there incubated with the same trinucleotide.

Consider the controlled randomisation of residue α. Because in
any one group of 4 libraries each amino acid is encoded only once, each amino
acid, as residue α, will occur in only three of the twelve libraries:






Presence or absence of an amino acid at position α within any
given library is a direct result of the controlled randomisation and the
genetic code.

This may now be applied to the assay. Consider that libraries
1-12 only are screened with the trinucleotide ATG and that in order for a
zinc finger to bind ATG, residue α must be Lys (lysine). An assay of
libraries 1-12 is performed:



[Figure: microtitre plate assay of libraries 1 to 12 (position α). Fixed
nucleotides of libraries 1 to 12, in order: A C G T (first codon position,
Ni), A C G T (second codon position, Nii), G G C C (third codon position,
Niii). A signal is shown in the wells of libraries 1, 5 and 9.]



Only libraries 1, 5 and 9 contain lysine as residue α, therefore
only these libraries can emit light. None of the other libraries can emit
light, because none of them specify lysine as residue α. However, this is
not the limit of our knowledge. We know the identity of the fixed nucleotide
within each library. Moreover, we can read this off directly from the
microtitre plate. In this case, the order of fixed nucleotides is AAG.

Thus, simply from the unique combination of libraries which
emit light, we know the genetic code for the amino acid required as residue
α. In this case, the essential fixed nucleotides are AAG, which specifies
lysine. We have now linked the genetic code directly to the physical
properties of a protein.
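
The read-out just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name decode is hypothetical, and the fixed nucleotides of libraries 1 to 12 are taken to be A, C, G, T, A, C, G, T, G, G, C, C, as read from the plate diagram above.

```python
# Fixed nucleotide of libraries 1..12 (assumption: as read from the plate
# diagram; libraries 1-4 fix codon position 1, 5-8 position 2, 9-12 position 3).
FIXED = "ACGTACGTGGCC"


def decode(signalling):
    """Return the codon spelled out by one signalling library per group of four."""
    codon = []
    for group_start in (1, 5, 9):
        hits = [lib for lib in signalling
                if group_start <= lib < group_start + 4]
        if len(hits) != 1:
            # More or fewer than one hit per group: the end point of the
            # assay has not been reached, so no codon can be read off.
            raise ValueError("expected exactly one signalling library per group")
        codon.append(FIXED[hits[0] - 1])
    return "".join(codon)


print(decode({1, 5, 9}))
```

For the example in the text, signals in libraries 1, 5 and 9 give the codon AAG, which specifies lysine.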

This principle may be applied to all 36 libraries. In so doing,
the genetic codes and thus required identities of all three residues α, β and
γ will be determined:

This is possible, because in libraries 1-12, residues β and γ
are fully randomised. Therefore, in each of libraries 1-12 Ser and Arg are
present as residues β and γ within the mixture.

Similarly, when controlled randomisation is applied to residue
β (libraries 13-24), residues α and γ are fully randomised, and when
controlled randomisation is applied to residue γ, residues α and β are fully
randomised.

By screening the 36 libraries with each of the 64
trinucleotides, an optimum zinc finger will be found for each trinucleotide.
The result is therefore the solution of the zinc finger code, whereby
DNA binding proteins may now be designed at will.

Should more than three libraries within a given set of twelve
produce a signal, then the plates may be washed to remove signals
resulting from weak interactions. An end point to the assay has been
reached when just three libraries per set of twelve generate a signal.

The above strategy generates libraries of genes which, when
expressed, yield protein libraries in which two positions are fully
randomised and one position has controlled randomisation. In practice,
this leads to libraries with between 400 (e.g. library 10) and 3600 (e.g.
library 9) constituent proteins. These numbers are calculated as follows:

Number of library constituents = product of the number of possibilities at
each position of randomisation

e.g. library 1:  position α x position β x position γ
               = 5 x 20 x 20
               = 2000 constituents (proteins)



However, these small libraries result from the degeneracy of
the genetic code. In practice, the gene libraries which encode the proteins,
randomised as above, will be far larger. For example, again consider
library 1:

Codon       α              β          γ
Sequence    A (A/C/T) N    NNN        NNN
Numbers     1 x 3 x 4  x   4 x 4 x 4  x  4 x 4 x 4  = 49152 constituents (genes)
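
The protein and gene counts for library 1 can be reproduced by enumeration. The sketch below is illustrative only; it assumes the standard genetic code, takes the controlled codon of library 1 to be A followed by A, C or T followed by any base (as implied by the factors 1 x 3 x 4 above), and uses hypothetical variable names.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' = termination codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Controlled position alpha of library 1: A, then A/C/T, then any base.
alpha_codons = ["A" + second + third for second in "ACT" for third in "ACGT"]
alpha_amino_acids = {CODON_TABLE[c] for c in alpha_codons}

# Fully randomised positions beta and gamma: NNN.
nnn_codons = list(CODON_TABLE)
nnn_amino_acids = {a for a in CODON_TABLE.values() if a != "*"}

gene_count = len(alpha_codons) * len(nnn_codons) ** 2
protein_count = len(alpha_amino_acids) * len(nnn_amino_acids) ** 2
print(gene_count, protein_count)
```

Under these assumptions the gene library has 12 x 64 x 64 = 49152 members while the encoded full-length protein library has 5 x 20 x 20 = 2000.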

The generation of such libraries should not be problematic
technically, since libraries far larger than these exist already (e.g. Choo and
Klug, 1994, PNAS 91, 11163-7). However, it may prove beneficial to
reduce the gene library sizes to those of the protein libraries. Potential
benefits include:

• greater likelihood of full representation within each library (all
constituent proteins encoded);

• even representation of each constituent (an equal amount of
each constituent protein within a given library);

• consistent optimum codon usage (to maximise expression).

These attributes are desirable because of the degeneracy of
the genetic code. Again consider library 1. Within this library, position β is
encoded by NNN. When expressed, therefore, residue β is 6 times more
likely to be serine than it is to be methionine, because serine is encoded six
times within NNN for each encoding of methionine.
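
The six-to-one bias can be confirmed by counting how often each amino acid is encoded within NNN. The following minimal Python sketch assumes the standard genetic code; the name counts is introduced here for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' = termination codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Number of codons within NNN that encode each amino acid.
counts = Counter(CODON_TABLE.values())
print(counts["S"], counts["M"], counts["*"])
```

Serine (S) is encoded by six codons and methionine (M) by one; the same count also shows that three of the 64 codons in NNN are termination codons.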

Such bias within libraries may have an adverse effect on the
results of the assay. Any detrimental effect is predicted to be minor: it
should occur only if two proteins have similar binding affinities with a given
DNA sequence. However, such an eventuality is possible: consider that
two zinc fingers with positions α=Arg, β=Ser, γ=Lys and α=Arg, β=Met,
γ=Lys bind similarly to a given sequence of DNA, with α=Arg, β=Met, γ=Lys
being the optimally binding zinc finger protein. During the assay, the
effective concentration of the protein containing serine at position β would
be greater than that of the protein containing methionine. Thus, the
serine-containing protein might give a stronger signal even though it is not
the optimum zinc finger for that DNA sequence.

It may therefore be preferred to substitute the codon MAX for
positions of full randomisation (previously NNN), where MAX is a mixture
containing only the following codons:

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA, GAT, GCG, GGC, GTG,
TAT, TGG, TGC, TTT.

These codons represent those most favoured by E. coli for
each amino acid (Nakamura et al., (1997), Nucleic Acids Research, 25,
244-245).
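
As a check on the MAX set, the twenty codons can be translated with the standard genetic code. The sketch below is illustrative only; MAX_CODONS and encoded are names introduced here, and the codon list is the one given above.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' = termination codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

MAX_CODONS = [
    "AAA", "AAC", "ACC", "AGC", "ATG", "ATT", "CAG", "CAT", "CCG", "CGC",
    "CTG", "GAA", "GAT", "GCG", "GGC", "GTG", "TAT", "TGG", "TGC", "TTT",
]

# Translate each MAX codon; a complete set encodes every amino acid once.
encoded = [CODON_TABLE[c] for c in MAX_CODONS]
print(len(set(encoded)), "*" in encoded)
```

Each of the twenty amino acids is encoded exactly once and no termination codon is present, so a position specified by MAX yields equal numbers of genes and encoded residues.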

In order to employ these codons in controlled randomisation,
a new division of the codons into sets of 12 libraries is required, as outlined
in randomisation strategy B:

The changes in controlled randomisation will affect the library
numbers which produce a signal and therefore the interpretation of the
assay results. However, the principles of controlled randomisation and the
mechanism of assay interpretation remain unchanged. Using
randomisation strategy B, the example illustrated above is reiterated:

Randomisation strategy A is, in principle, the easier strategy to
implement technically. However, strategy B is preferred. Gene libraries of
much smaller size are required. Although construction of these
highly-controlled libraries is technically demanding, it is much more likely
that the libraries encode all required proteins and moreover that those
proteins are encoded in similar proportions, so removing potential
difficulties in the SPA library assays.

Construction of these gene libraries may be achieved by
cloning oligonucleotide cassettes between two appropriately positioned
restriction sites which flank positions α and γ. Construction of the
oligonucleotide cassettes requires a set of sixty-one oligonucleotides
comprising one fully-randomised "template" oligonucleotide and three pools
of selection oligonucleotides. The template oligonucleotide is of sequence

3'-----NNN-----NNN-----NNN-----5'

where "-----" represents the invariant DNA and NNN the positions of
randomisation within the non-coding strand of the gene. The intervening
sequences "-----" are conveniently between 3 and 21 bases in length.

The pools of selection oligonucleotides contain twenty
individual oligonucleotides of sequence

Lys:  5'-----AAA-----3'
Asn:  5'-----AAC-----3'
Thr:  5'-----ACC-----3'
Ser:  5'-----AGC-----3'
Met:  5'-----ATG-----3'
Ile:  5'-----ATT-----3'
Gln:  5'-----CAG-----3'
His:  5'-----CAT-----3'
Pro:  5'-----CCG-----3'
Arg:  5'-----CGC-----3'
Leu:  5'-----CTG-----3'
Glu:  5'-----GAA-----3'
Asp:  5'-----GAT-----3'
Ala:  5'-----GCG-----3'
Gly:  5'-----GGC-----3'
Val:  5'-----GTG-----3'
Tyr:  5'-----TAT-----3'
Trp:  5'-----TGG-----3'
Cys:  5'-----TGC-----3'
Phe:  5'-----TTT-----3'



where the sequence "-----" is of suitable length and base sequence to
base pair with the non-variant regions of the template and the defined
codon corresponds to one of those comprising the "MAX" set of codons
(defined herein at page 18, line 5). The defined codon corresponds to a
position of randomisation and must be either at or near to one end of the
oligonucleotide. A complete selection pool represents a set of twenty such
oligonucleotides, in order that all codons contained within "MAX" are
represented and all twenty amino acids are encoded.

The invention enables fully randomised libraries, positionally
fixed libraries and individual genes to be constructed. Oligonucleotides
encoding the required amino acid at each position of randomisation would
be taken from each selection pool. For example, if full randomisation is
required at a given position, then all 20 selection oligonucleotides would be
taken. If positional fixing were required, then all oligonucleotides where the
"MAX" codon begins with A (for example) would be taken. If a single amino
acid were required at the position of randomisation, the single selection
oligonucleotide corresponding to that amino acid would be taken.
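
The three ways of drawing from a selection pool can be sketched as follows. This Python sketch is illustrative only: the function names select_full, select_fixed and select_single are hypothetical, and each selection oligonucleotide is represented simply by its "MAX" codon.

```python
# One "MAX" codon per amino acid, as listed for the selection pools.
MAX = {
    "Lys": "AAA", "Asn": "AAC", "Thr": "ACC", "Ser": "AGC", "Met": "ATG",
    "Ile": "ATT", "Gln": "CAG", "His": "CAT", "Pro": "CCG", "Arg": "CGC",
    "Leu": "CTG", "Glu": "GAA", "Asp": "GAT", "Ala": "GCG", "Gly": "GGC",
    "Val": "GTG", "Tyr": "TAT", "Trp": "TGG", "Cys": "TGC", "Phe": "TTT",
}


def select_full():
    """Full randomisation: take all twenty selection oligonucleotides."""
    return sorted(MAX.values())


def select_fixed(base, position):
    """Positional fixing: take codons with `base` at `position` (0, 1 or 2)."""
    return sorted(c for c in MAX.values() if c[position] == base)


def select_single(amino_acid):
    """Single residue: take the one oligonucleotide for that amino acid."""
    return [MAX[amino_acid]]


print(select_fixed("A", 0))
```

Fixing A at the first position selects the six codons AAA, AAC, ACC, AGC, ATG and ATT (Lys, Asn, Thr, Ser, Met and Ile). Combined with full randomisation at the two other positions, this would give 6 x 20 x 20 = 2400 genes encoding the same number of proteins.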

Construction of a single zinc finger gene encoding α=Lys, β=Ser, γ=Arg

The selection oligonucleotides β-Ser and γ-Arg are treated
with T4 polynucleotide kinase and ATP in order to attach 5' phosphate
groups and so enable them to participate in ligation reactions. These two
oligonucleotides, together with the selection oligonucleotide α-Lys and the
template oligonucleotide, are combined, heated to 90 °C and allowed to
cool slowly to room temperature, in order to allow complementary
sequences of DNA to base pair as shown below:



[Figure: selection oligonucleotides α-Lys (AAA), β-Ser (AGC) and γ-Arg
(CGC), one from each of pools α, β and γ, annealed 5' to 3' to the template
(one fully-randomised oligonucleotide, NNN NNN NNN). A half restriction
site is marked on the cassette. Key: invariant DNA sequence within pool α;
invariant DNA sequence within pool β; invariant DNA sequence within pool γ;
invariant DNA sequence of the template oligonucleotide.]

The resulting oligonucleotide cassette is then inserted into the appropriate
restriction sites in the zinc finger gene, so generating the zinc finger gene
α=Lys, β=Ser, γ=Arg. None of the other sequences contained in the
template oligonucleotide are cloned, since only the double stranded DNA
cassette will be ligated into the parental gene. Selection from the template
oligonucleotide is thus achieved by addition of the three selection
oligonucleotides.



Construction of zinc finger library 1

The selection oligonucleotides β-MAX and γ-MAX (where
MAX = an entire selection pool) are treated with T4 polynucleotide kinase
and ATP in order to attach 5' phosphate groups and so enable them to
participate in ligation reactions. These two oligonucleotide pools, together
with the selection oligonucleotide α-MIX 1, where MIX 1 is the following
mixture of oligonucleotides:

α-Lys:  5'-----AAA-----3'
α-Asn:  5'-----AAC-----3'
α-Thr:  5'-----ACC-----3'
α-Ser:  5'-----AGC-----3'
α-Met:  5'-----ATG-----3'
α-Ile:  5'-----ATT-----3'

and the template oligonucleotide are combined, heated to 90 °C and
allowed to cool slowly to room temperature, in order to allow
complementary sequences of DNA to base pair as above.

The resulting mixture of oligonucleotide cassettes is then
inserted into the appropriate restriction sites in the zinc finger gene, so
generating the zinc finger library 1. None of the other sequences contained
in the template oligonucleotide are cloned, since only the double stranded
DNA cassettes will be ligated into the parental gene. Selection from the
template oligonucleotide is thus achieved by addition of the three pools of
selection oligonucleotides. Note that the number of genes exactly matches
the number of encoded proteins and that no truncated proteins should
result, since "MAX" contains no termination codons.

Generalised application to randomised peptides

The above technique may also be used to generate genes
encoding fully randomised peptides, without intervening conserved gene
sequences. Again, the number of genes will exactly match the number of
encoded peptides. In the case of a fully randomised peptide library without
positional fixing, just 21 oligonucleotides are required: a fully-randomised
template oligonucleotide of the desired length and a set of the twenty
"MAX" trinucleotides. Annealing between the set of "MAX" trinucleotides
and the template will generate cassettes encoding all possible peptides,
dependent on complete representation within the template oligonucleotide,
which will decrease with oligonucleotide length.

Positionally fixed, random peptides may be made similarly,
although a set of twelve templates will be required for each codon. Here,
for a given codon, the non-coding template strand will be fixed alternately
as T, G, C and A at each nucleotide and the "MAX" trinucleotides annealed
as above.

The above strategies A and B involve designing sets of
libraries of genes which in turn may be expressed to generate
corresponding libraries of proteins.

The method of the invention involves incubating a set of
libraries of proteins with a specific binding partner, observing specific
binding interactions with certain libraries of the set, and using the
observations to identify a protein which interacts with the specific binding
partner. Although other assay techniques are possible, this method is
preferably performed using scintillation proximity assay (SPA) technology.
Briefly, this technology involves providing a support which comprises a
scintillant which emits light when subjected to electrons (e.g. β particles) or
other forms of radiation resulting from decomposition of a radioisotope.
The support may be massive, e.g. the base of each well of a microtitre
plate, or may be particulate. One assay reagent is immobilised on the
support. Another assay reagent is radiolabelled and is partitioned between
two fractions, one bound to the support and the other free in solution. The
relative size of the two fractions is arranged to be related to the presence or
the concentration of an analyte of interest. The radioisotope is chosen
such that reagent bound to the support causes the scintillant in the support
to emit light, while reagent free in solution does not (on account of the short
mean free path of the radiation) significantly affect the scintillant substance.

Various assay formats are possible. For example, each
library of a set of libraries can be immobilised in an individual well, either of
a standard microtitre plate or of a scintillant-containing microtitre plate. A
specific binding partner of the proteins is labelled and introduced into each
well. Labels can be radiometric, luminescent (for example fluorescent) or
may be enzymes. Where radiometric or luminescent labels are used, a
specific binding interaction can be investigated in real time. Where enzyme
labels are used, the interaction can be investigated upon the addition of the
appropriate reagents needed to generate a signal. Where several wells
emit a signal, repeated washing can be used to remove weakly interacting
species until the specific binding partner remains bound only in a single
well. This ability to identify a single library (as opposed to a small pool of
libraries) that binds most strongly to any particular specific binding partner is
a valuable feature, and an advance on assay techniques used previously
for similar purposes.

Alternatively, the specific binding partner can be immobilised
in each well of the SPA microtitre plate. Each protein library is
radiolabelled and introduced into a different well of the plate for interaction
with the specific binding partner. Alternative assay formats, in which
neither the protein library nor its specific binding partner, but rather a third
reagent, is radiolabelled, are well known in the art.

Techniques for immobilising protein or other assay reagents
on SPA surfaces in forms suitable for taking part in SPA assays are well
known in the art. Development of suitable techniques should not amount to
more than the routine optimisation ordinarily required for assays of this
kind. Detection of interactions by non-radioactive assay and imaging
techniques such as luminescent (for example fluorescent) detection or
colorimetric detection of interactions between, for example, biotin-linked
and streptavidin-linked partners is also envisaged.

Most zinc finger proteins form the DNA recognition module of
transcription factors, which serve to switch genes on or off. Already,
several examples exist where novel transcription factors have been
engineered by changing their zinc fingers (Choo et al. (1994), Nature 372,
642-5). Similarly, zinc fingers have been linked to restriction endonuclease
cleavage domains, to generate novel restriction endonucleases (e.g. Kim et
al. (1996), PNAS 93, 1156-60). The application of zinc fingers is almost
limitless: whenever a need arises to link something to a specific sequence
of DNA, it can be met with a series of zinc fingers. However, in order to
design DNA-binding proteins at will, there must be available one zinc finger
for each trinucleotide. This invention provides enabling technology to
achieve that object.



Example 

The example involves a single protein, comprising three zinc
fingers. Controlled randomisation is applied only to the central zinc finger.
The two outer zinc fingers are present simply to ensure correct registry with
the target DNA sequence and to increase overall binding strength (Choo
and Klug, (1994) PNAS 91, 11163-67; Berg (1997) Nature Biotech. 15,
323).

The work is divided into four stages: gene synthesis, gene
expression, radiometric and colorimetric assay formats, assay results and
proof of principle.

Gene Synthesis:

A gene was designed and synthesised to encode the protein
(SEQ ID NO: 1)

TGEKP YKCPECGKSFSKKSHLVAHQRTH

TGEKP YKCPECGKSFSKKSHLVAHQRTH

TGEKP YKCPECGKSFSKKSHLVAHQRTH.

Key (residue classes distinguished typographically in the original):

linker residues
zinc co-ordinating residues
DNA-contacting residues (α, β and γ) (positions -1, +3 and +6)

This protein corresponds to three repeats of Berg's
consensus zinc finger sequence (Krizek et al., (1991) JACS 113, 4518-23),
with DNA-contacting residues from the first zinc finger of transcription
factor Sp1 (Berg (1992) PNAS 89, 11109-10; Shi and Berg, (1995) Chem
& Biol. 2, 83-89). Each zinc finger sequence is preceded by a Krüppel-type
linker peptide (Choo and Klug (1993) NAR 21, 3341-6). By analogy to
previous precedent (Shi and Berg, 1995), the three repeats of this novel
zinc finger peptide are expected to bind to the dsDNA sequence
5'-GGG GGG GGG-3'.

To maximise gene expression, on converting the sequence
into DNA, E. coli codon preference was employed (Wada et al. (1992)
NAR 20 sup., 2111-8). Wherever possible, first preference codons were
used. However, in some instances, second preference codons were also
employed. These limited sequence repetition within the gene, which is
necessary to prevent potential intragenic recombination events that would be
deleterious to ensuing experiments. In practice, a maximum repeat length
of 8 base pairs was mostly achieved. Use of second preference codons
also allowed the incorporation of restriction enzyme sites within the gene.
The final gene sequence, restriction sites and codon usage are illustrated
in Figure 1.

Gene Expression

In the current assay format, the zinc finger gene is fused to
the glutathione-S-transferase gene in the vector pGEX2TK (Amersham
Pharmacia Biotech). Expression of this construct leads to a 36.5 kD
protein comprising GST at the amino terminus and the zinc finger protein at
the carboxyl terminus. Gene expression is performed in E. coli BL21 cells
according to manufacturer's instructions. The resulting fusion protein is
then purified using glutathione-Sepharose (Amersham Pharmacia Biotech)
according to manufacturer's instructions. Use of the pGEX2TK vector
allows for the subsequent radiolabelling of the protein if required.

Assay formats for assessing zinc finger - DNA interactions 

20 Direct attachment of GST fusion protein to microtitre plates, followed by 
colorimetric detection of biotinylated DNA (Assay format 1) 

GST or GST ZF protein (4 pmoles per well) was immobilised
in microtitre wells in carbonate buffer, pH 9.2, for 18 hrs. The plates were
washed three times in TBS-Tween (0.3% Tween) and then blocked in the
same buffer for 3 hrs. After washing, 2-fold serial dilutions of DNA were
added to each well. The protein and DNA were incubated together for 2
hrs at room temperature, and the wells were then washed 3 times in
TBS-Tween. As negative controls, experiments were performed in the absence
of DNA, to assess binding of GST / GST ZF proteins by the streptavidin
conjugate. Bound DNA was detected by adding streptavidin / peroxidase
conjugate, the excess of which was removed by 3 washes in TBS. Finally, the
bound conjugate was detected colorimetrically according to the manufacturer's
instructions. All reactions were performed in duplicate. Figure 1 demonstrates that
interaction between the zinc finger protein and its target DNA sequence
may be assessed using this assay format. In Figures 1, 2 and 3, the legend
'bkg' denotes background detection levels.
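
The plate layout described above (2-fold serial dilutions, duplicate wells, a background level) can be handled with a short calculation. The sketch below is illustrative only: the amounts and readings are hypothetical, and subtracting the background from the mean of the duplicates is an assumption, since the text does not state how the 'bkg' values were used.

```python
def twofold_series(top_amount, n_wells):
    """Amounts for a 2-fold serial dilution, starting from top_amount."""
    return [top_amount / 2 ** i for i in range(n_wells)]


def net_signal(duplicate_readings, background):
    """Average each pair of duplicate readings and subtract the background."""
    return [(a + b) / 2 - background for a, b in duplicate_readings]


dna_amounts = twofold_series(8.0, 4)      # hypothetical top amount per well
readings = [(1.0, 3.0), (1.0, 1.0)]       # hypothetical duplicate wells
print(dna_amounts)
print(net_signal(readings, background=0.5))
```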

Direct attachment of GST fusion protein to microtitre plates, followed by 
scintillation-based detection of radiolabelled DNA (Assay format 2) 

GST or GST ZF protein (4 pmoles per well) was immobilised
in microtitre wells in carbonate buffer, pH 9.2, for 18 hrs. The plates were
washed three times in TBS-Tween (0.3% Tween) and then blocked in the
same buffer for 3 hrs. After washing, 2-fold serial dilutions of radiolabelled
DNA were added to each well. The protein and DNA were incubated
together for 2 hrs at room temperature, and the wells were then washed 3 times in
TBS-Tween. Bound DNA was detected by scintillation counting. All
reactions were performed in duplicate. Figure 2 demonstrates that
interaction between the zinc finger protein and its target DNA sequence
may be assessed using this assay format.

Antibody-based attachment of GST fusion protein to microtitre plates,
followed by scintillation-based detection of radiolabelled DNA (Assay 
format 3) 

One µg of protein A was attached to the surface of each
microtitre well in carbonate buffer, pH 9.2, for 18 hrs. The plates were
washed three times in TBS-BSA (2% BSA) and then blocked in the same
buffer for 3 hrs. Anti-GST antibody (1 µg) was added to each well in the
same buffer and incubated at room temperature with rocking for 1 hr. The
plates were washed 3 times in TBS-BSA and then incubated for 1 hr with 4
pmoles GST / GST ZF protein per well. After washing away unbound
protein, the plates were incubated for 2 hrs at room temperature with 2-fold
serial dilutions of radiolabelled DNA. Unbound DNA was removed by 3 washes
in TBS-BSA. As negative controls, experiments were performed in the
absence of antibody, to assess any binding of radiolabelled DNA by protein
A. All reactions containing GST / GST ZF were performed in duplicate.
Figure 3 demonstrates that interaction between the zinc finger protein and
its target DNA sequence may be assessed using this assay format.

Conclusion 

Three adsorption-based assay formats have been developed.
All assay formats demonstrate interaction between the protein and its DNA
target sequence. In each case, the protein is immobilised and the DNA is
in solution. Labelled DNA is bound by the immobilised protein and then
detected according to the nature of the label. Radiolabelled DNA is
detected using scintillation-based methods or appropriate imaging
technology. Non-radiometrically labelled DNA is detected using
colorimetric techniques and a spectrophotometer. The assay formats are
also applicable to fluorescently labelled DNA, where imaging technology
would be used to detect the bound DNA.

CLAIMS

1. A set of libraries of genes which code for proteins which are
capable of specific binding interactions by virtue of amino acid residues at
two or more determined positions including a first determined position and
one or more other determined positions, which set of libraries consists of:

a) 6 to 20 libraries in which each library has a triplet that codes
for one or several but less than 20 amino acids at the said first determined
position, and is randomised at the triplet or triplets coding for the said one
or more other determined positions, the arrangement being such that
interactions of the proteins coded for by the said 6 to 20 libraries with a
specific binding partner identifies a triplet that codes for an amino acid at
the said first determined position that takes part in the specific binding
interaction, and

b) 6 to 20 libraries of corresponding design for each of the said
one or more other determined positions.

2. The set of libraries of genes as claimed in claim 1, which set
of libraries consists of:

a) 12 libraries in which each library has a triplet that codes for
one or several but less than 20 amino acids at the said first determined
position, the triplets being as shown in Table 1 or Table 2, and

b) 12 libraries of corresponding design for each of the said one
or more other determined positions.

3. The set of libraries of genes as claimed in claim 1 or claim 2,
wherein the genes code for zinc fingers.

4. The set of libraries of genes as claimed in claim 3, which set
consists of 36 libraries in three groups of 12 libraries which code for amino
acids at the -1 and +3 and +6 positions respectively.

5. The set of libraries of genes as claimed in claim 3 or claim 4,
wherein each gene codes for a protein comprising 3 zinc fingers.

6. The set of libraries of genes as claimed in claim 5, wherein
each gene codes for a protein having the sequence (SEQ ID NO: 2)

T G E K P YKCPECGKSFSXKSXLVXHQRTH
T G E K P YKCPECGKSFSXKSXLVXHQRTH
T G E K P YKCPECGKSFSXKSXLVXHQRTH

where X is any amino acid.

7. A set of libraries of proteins, which proteins are capable of
specific binding interactions by virtue of amino acid residues at two or more
determined positions including a first determined position and one or more
other determined positions, which set of libraries consists of:

a) 6 to 20 libraries in which each library has one or several but
less than 20 amino acid residues at the said first determined position and is
randomised at the said one or more other determined positions, the
arrangement being such that interaction of the 6 to 20 libraries with a
specific binding partner identifies an amino acid residue at the said first
determined position that takes part in the specific binding interaction, and

b) 6 to 20 libraries of corresponding design for each of the said
one or more other determined positions.

8. The set of libraries of proteins as claimed in claim 7, which
set of libraries consists of

a) 20 libraries in which each library has one specified amino acid
residue at the said first determined position and is randomised at the said
one or more other determined positions, and

b) 20 libraries of corresponding design for each of the said one
or more other determined positions.

9. The set of libraries of proteins as claimed in claim 7 or claim
8, wherein the proteins are zinc fingers.

10. The set of libraries of proteins as claimed in claim 7, which 
set consists of 60 libraries in three groups of 20 libraries with specified 
amino acids at the -1 and +3 and +6 positions respectively. 
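
The arithmetic behind the 60-library set can be illustrated with a short enumeration. This is a sketch under stated assumptions, not part of the claimed subject matter: the function name and the dictionary representation of a library are hypothetical, and "X" stands for a randomised position.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
POSITIONS = ("-1", "+3", "+6")          # determined positions of one finger


def library_set(positions=POSITIONS, residues=AMINO_ACIDS):
    """One library per (position, residue): that position is fixed and the
    remaining determined positions are randomised ("X")."""
    libraries = []
    for fixed in positions:
        for aa in residues:
            libraries.append({p: (aa if p == fixed else "X") for p in positions})
    return libraries


libraries = library_set()
print(len(libraries))              # 3 groups of 20 libraries
print(len(AMINO_ACIDS) ** 3)       # individual variants if made one by one
```

The set therefore samples all 8000 residue combinations at the three positions while requiring only 60 libraries to be prepared and screened.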

11. The set of libraries of proteins as claimed in claim 9 or claim
10, wherein each protein comprises three zinc fingers.

12. The set of libraries of proteins as claimed in claim 11, wherein
each protein has the sequence (SEQ ID NO: 2)

T G E K P YKCPECGKSFSXKSXLVXHQRTH
T G E K P YKCPECGKSFSXKSXLVXHQRTH
T G E K P YKCPECGKSFSXKSXLVXHQRTH

where X is any amino acid.

13. The set of libraries of proteins as claimed in any one of claims
7 to 12, which set results from expression of the set of libraries of genes as
claimed in any one of claims 1 to 6.

14. A set of libraries of genes which code for the set of libraries of
proteins defined in any one of claims 7 to 12.



15. A method of identifying a protein which interacts with a
specific binding partner, which method comprises providing a set of
libraries of proteins as defined in any one of claims 7 to 13, incubating the
specific binding partner with each library of the set, observing specific
binding interactions with certain libraries of the set, and using the
observations to identify a protein which interacts with the specific binding
partner.
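
One way in which the observations might be used to identify a protein is sketched below. This is an assumption made for illustration: the claim does not specify how the observations are interpreted, the rule of taking the strongest signal at each position is hypothetical, and the signal values are invented placeholders rather than experimental data.

```python
def deconvolute(signals):
    """Pick, for each determined position, the residue whose library gave the
    strongest binding signal.

    signals maps (position, residue) -> measured signal for that library.
    """
    best = {}
    for (position, residue), value in signals.items():
        if position not in best or value > best[position][1]:
            best[position] = (residue, value)
    return {position: residue for position, (residue, _) in best.items()}


# Hypothetical readings for a few libraries of the set.
signals = {
    ("-1", "R"): 0.9, ("-1", "A"): 0.1,
    ("+3", "H"): 0.8, ("+3", "A"): 0.2,
    ("+6", "R"): 0.7, ("+6", "K"): 0.3,
}
print(deconvolute(signals))
```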

16. The method as claimed in claim 15, wherein the specific 
binding partner is a polynucleotide. 

17. The method as claimed in claim 15, wherein the specific
binding interactions are observed by radiometric or luminescent assay.

18. The method as claimed in claim 15, wherein the specific 
binding interactions are observed by imaging means. 


19. The method as claimed in claim 15, wherein the specific 
binding interactions are observed by scintillation proximity assay. 



20. The method as claimed in claim 19, wherein the sets of
libraries of proteins are immobilised on scintillation proximity assay
surfaces and the specific binding partner is radiolabelled.

21. The method of claim 19 or claim 20, wherein after incubation
the scintillation proximity assay surfaces are washed to distinguish stronger
specific binding interactions from weaker ones.

22. The method as claimed in claim 15, wherein the specific
binding interactions are observed by colorimetric means. 

23. The method as claimed in claim 22, wherein the specific
binding partner is biotinylated and the specific binding interaction is
detected using a signal generating streptavidin conjugate.

24. The method as claimed in claim 22 or claim 23 wherein after
incubation the binding interactions are washed to distinguish stronger
specific binding interactions from weaker ones.

25. A protein having the sequence (SEQ ID NO: 1) 

T G E K P YKCPECGKSFS. KSk^LV^HQRTH

T G E K P YKCPECGKSFS^ KS*LV$HQRTH

T G E K P YKCPECGKSFS. KS-=>LV$HQRTH.

25. A gene which codes for the protein of claim 24.

26. A method of constructing randomised gene libraries in which
the number of genes is the same as the number of encoded proteins and
which contain no termination codons at the predetermined positions of
randomisation, the method comprising the steps of:

a) providing a template oligonucleotide which is fully randomised
at one or more predetermined codon positions;

b) for each predetermined codon position providing a pool of
selection oligonucleotides, wherein each member of said pool contains a
different codon selected from the group consisting of

AAA, AAC, ACC, AGC, ATG, ATT, CAG, CAT, CCG, CGC, CTG, GAA,
GAT, GCG, GGC, GTG, TAT, TGG, TGC, TTT

at the predetermined codon position;
c) selecting one or more selection oligonucleotides from each
pool in order to encode the required gene or library;

d) allowing the selected selection oligonucleotides from each
pool to hybridise with the template oligonucleotide;

e) forming one or more constructs by ligating the hybridised
selection oligonucleotides together;

f) removing a region from a gene of interest corresponding to
the hybridised product from step e);

g) forming a gene or library of genes by ligating the products
from step e) into the said gene of interest wherein the said gene of interest
is contained within a suitable expression vector.
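
The property relied on in this claim, namely one codon per amino acid and no termination codons, can be checked against the standard genetic code. The sketch below is illustrative only: the helper names are hypothetical, and the seventh codon of the list is taken as CAG (glutamine), which is an assumption. A list containing GAG in that position would encode glutamate twice and glutamine not at all.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SELECTION_CODONS = [
    "AAA", "AAC", "ACC", "AGC", "ATG", "ATT", "CAG", "CAT", "CCG", "CGC",
    "CTG", "GAA", "GAT", "GCG", "GGC", "GTG", "TAT", "TGG", "TGC", "TTT",
]


def encoded_residues(codons):
    """Translate each codon with the standard genetic code ('*' is a stop)."""
    return [GENETIC_CODE[codon] for codon in codons]


def one_codon_per_residue(codons):
    """True if the codons encode 20 different amino acids and no stop."""
    residues = encoded_residues(codons)
    return "*" not in residues and len(residues) == 20 and len(set(residues)) == 20


print(one_codon_per_residue(SELECTION_CODONS))
# Fully randomised NNN triplets give 64 genes per position, including
# 3 stop codons, whereas this set gives 20 genes for 20 proteins.
print(len(GENETIC_CODE), list(GENETIC_CODE.values()).count("*"))
```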

27. A method of producing proteins encoded by the randomised
gene libraries of claim 26 comprising the steps of:

a) transforming a suitable host cell with the gene or gene
library of claim 26 construct;

b) expressing the genes to form proteins; 

c) purifying the proteins. 